(54) METHOD FOR INFERRING PROTEIN FUNCTIONS WITH THE USE OF LIGAND DATA BASE

(57) A method of predicting biological functions of query proteins whose steric structures are known or predictable, using a three-dimensional structure database storing bio-active compounds which bind to target proteins with known biological functions, which comprises the steps of:



(1) extracting bio-active compounds capable of binding to said query protein as ligand candidates from said database based on the capability of complex formation between the query protein and bio-active compounds; and

(2) predicting that biological functions of the query protein are identical or similar to the biological functions of the target protein to which said ligand candidates bind.
Description

Technical Field

[0001] The present invention relates to methods of predicting protein functions and to databases used for said methods.

Background Art 

[0002] Proteins are biopolymers composed of 20 kinds of amino acids as building blocks, with structures in which about 50 to 1,000 amino acids are connected in a chain by peptide bonds (-CONH-). Various kinds of proteins are known, such as enzymes which catalyze substance conversion in organisms, receptors involved in inter- or intracellular signal transduction, receptors involved in the control of gene expression, cytokines which are secreted at the time of inflammation, proteins involved in the transport of substances, and others. In higher animals such as humans there are 50,000 to 100,000 kinds of proteins, each of which plays specific functions and roles.

[0003] Enzymes provide fields for chemical reactions in which specific products are obtained by acting on specific substrates, and they carry out stereospecific or regiospecific reactions under mild conditions. Receptors transduce signals through structural changes upon the binding of hormones and signal transmitters. A feature common to these enzymes and receptors is that their biological functions appear through the formation of stable complexes with specific molecules (ligands). Protein molecules, which are long like strings, are folded into certain steric structures and form structural sites (ligand binding sites) which bind specifically to artificial molecules such as drugs and to specific biomolecules. This ligand binding site is essential for enzymes and receptors to exhibit their functions.
[0004] The steric structures of proteins can be determined by X-ray crystallographic analysis and NMR analysis. Owing to the remarkable progress and spread of these analytical techniques, determining the steric structures of proteins has become easy, and the number of proteins analyzed is increasing at an accelerating rate. The Protein Data Bank, a database of protein structures, currently stores the three-dimensional coordinates of more than 7,000 proteins, and the data are available throughout the world. Accordingly, once the functions of a protein are known, it has become possible to understand the relationship between its structure and function at the atomic level by analyzing the crystal structures of its complexes with appropriate ligands. Moreover, by using crystallographically analyzed steric structures of proteins as templates and substituting the side chains of amino acids, it has become possible to predict the steric structure of a protein with a highly homologous amino acid sequence (homology modeling).
[0005] Protein studies have so far been conducted by separating and purifying proteins using their biological functions as a guide and then determining their amino acid sequences to analyze structure and function. Recently, however, as gene analysis has become easy, there are cases in which the existence of a protein is suggested by genetic information alone. For example, the existence of a considerable number of proteins has been revealed by a large-scale project aimed at analyzing the human genome, and these results are expected to be used for elucidating the causes of diseases and for drug design.

[0006] However, for the proteins successively found by genome analysis, only their amino acid sequences are elucidated, and in most cases their biological functions cannot be predicted at all. For this reason, an enormous amount of study is needed to predict or confirm the function of each protein, which is an obstacle to the effective use of genome information. Moreover, although the steric structures of proteins whose amino acid sequences have been elucidated can now be determined more easily than before thanks to progress in crystallographic and NMR analysis, in many cases the functions remain largely unknown even after the steric structures have been elucidated.

[0007] At present, no method of easily predicting the functions of novel proteins has been established. For example, one prediction method compares the amino acid sequence of a novel protein with groups of amino acid sequences of proteins with known functions and, if a protein with high homology is found, predicts that the novel protein has functions similar to that known protein. Furthermore, for multiple proteins with the same function, information on the correlation between structure and function can be obtained by aligning the sequences so that the homologous parts become as large as possible. However, even among proteins with the same function, homology is generally not very high when the biological species differ. Thus, the above-mentioned alignment-based methods are not helpful at all for many proteins, whether or not their functions are known to be the same.
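The prior-art homology approach described in [0007] can be sketched as below. This is a minimal illustration assuming the sequences have already been aligned (gaps written as "-") and defining identity as identical residues divided by the columns where both sequences carry a residue; the 40 % threshold and the function names are hypothetical choices, not part of the specification.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    paired = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not paired:
        return 0.0
    identical = sum(1 for x, y in paired if x == y)
    return 100.0 * identical / len(paired)


def predict_by_homology(hits: list[tuple[str, str, str]],
                        threshold: float = 40.0) -> Optional[str]:
    """hits: (known function, aligned query, aligned known protein).

    Returns the function of the most identical known protein, or None when
    no hit reaches the identity threshold.
    """
    if not hits:
        return None
    identity, function = max(((percent_identity(q, k), f) for f, q, k in hits),
                             key=lambda pair: pair[0])
    return function if identity >= threshold else None
```

For functionally equivalent proteins from different species, the identity can fall below any such threshold, which is the limitation the paragraph points out.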

Disclosure of Invention

[0008] An object of the present invention is to provide methods of predicting the functions of proteins. More specifically, the object of the present invention is to provide methods of easily predicting the functions and roles in organisms of proteins whose steric structures are known or predictable. Another object of the present invention is to provide a database which is helpful for exploring the shapes and properties of the ligand binding sites of proteins from the side of bio-active compounds (ligands).

[0009] As a result of zealous efforts to achieve the above-mentioned objects, the inventors found that the functions of proteins without known functions can be predicted with good accuracy by preparing a three-dimensional structure database which stores bio-active compounds capable of binding as ligands to target proteins with known biological functions, judging the capability of complex formation between a protein without known function and each bio-active compound in the database, and selecting bio-active compounds with high capability of complex formation as ligand candidates. The present invention was achieved based on these findings.
[0010] The present invention thus provides methods of predicting biological functions of query proteins whose steric structures are known or predictable, using a three-dimensional structure database which stores one or more bio-active compounds which bind to target proteins with known biological functions, which comprise the steps of:

(1) extracting bio-active compounds capable of binding to said query protein from said database as ligand candidates, based on the capability of complex formation between the query protein and the bio-active compounds; and

(2) predicting that biological functions of the query protein are identical or similar to the biological functions of the target proteins to which said ligand candidates bind.

According to a preferred embodiment of the present invention, the above-mentioned method comprises the steps of:

(3) extracting one or more ligand binding sites for the query protein; 

(4) exploring the most stable complex formed with the ligand binding sites of the query protein for each bio-active compound included in the database;

(5) extracting bio-active compounds which satisfy preset hit conditions, based on the stabilities and structural features of the most stable complexes;

(6) further extracting, as required, bio-active compounds from among those extracted in step (5) which satisfy hit conditions different from those in step (5); and

(7) predicting that biological functions of the query protein are identical or similar to the biological functions of the target protein to which said ligand candidates bind, treating the bio-active compounds extracted in step (5) or (6) as ligand candidates.
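Steps (4) to (7) can be outlined as a screening loop over the database. The sketch below is not ADAM&EVE: it assumes a hypothetical `dock` callable that returns the most stable complex for one compound at one binding site (reduced here to an interaction energy and a hydrogen-bond count), and it models each set of hit conditions as a predicate. The names `Compound`, `Complex` and `predict_functions` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Compound:
    name: str
    target_function: str        # biological function of the known target protein


@dataclass(frozen=True)
class Complex:
    energy: float               # interaction energy; lower means more stable
    n_hbonds: int


HitCondition = Callable[[Complex], bool]


def predict_functions(sites: Iterable[str],
                      database: Iterable[Compound],
                      dock: Callable[[str, Compound], Complex],
                      first: HitCondition,
                      second: Optional[HitCondition] = None) -> dict[str, list[str]]:
    """Map each predicted function to the ligand candidates supporting it."""
    sites = list(sites)         # step (3) is assumed done: candidate binding sites
    if not sites:
        raise ValueError("at least one ligand binding site is required")
    predictions: dict[str, list[str]] = {}
    for compound in database:
        # Step (4): keep only the most stable complex over all binding sites.
        best = min((dock(site, compound) for site in sites), key=lambda c: c.energy)
        # Step (5): preset hit conditions; step (6): optional second conditions.
        if not first(best) or (second is not None and not second(best)):
            continue
        # Step (7): the query may share the function of this compound's target.
        predictions.setdefault(compound.target_function, []).append(compound.name)
    return predictions
```

Keeping docking behind a callable separates the search (step 4) from the selection logic (steps 5 to 7), so any docking engine could be plugged in.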

[0011] In a further preferred method of the present invention, the above-mentioned steps (4) through (6) are performed automatically using the program ADAM&EVE (PCT/JP95/02219; WO96/13785). According to other embodiments of the present invention, there are provided a method of predicting biological functions of query proteins using a three-dimensional database which stores one or more bio-active compounds which bind to target proteins with known biological functions; a method of predicting biological functions of query proteins by exploring the shapes and properties of the ligand binding sites of the query protein using one or more bio-active compounds which bind to target proteins with known biological functions and which are stored in a three-dimensional database; and a method of predicting functions of query proteins by extracting ligand candidates for the query protein from a three-dimensional database which stores one or more bio-active compounds which bind to target proteins with known biological functions.
[0012] According to still another embodiment of the present invention, there is provided a three-dimensional database which stores one or more bio-active compounds which bind to target proteins with known biological functions and which is used for each of the above-mentioned methods. According to a preferred embodiment of the present invention, there is provided a database including information about the target protein for each bio-active compound, and in a further preferred embodiment, the above-mentioned database is prepared in a form which enables the above-mentioned steps (4) through (6) to be performed automatically using the program ADAM&EVE (in this specification, such a database is sometimes referred to as an "ADAM-style database").

[0013] From another point of view, the present invention provides a method of predicting biological functions of query proteins whose steric structures are known or predictable, using a three-dimensional database which stores one or more intrinsic bio-active compounds with known bio-activities in organisms but whose target proteins are unknown, which comprises the steps of:



(1) extracting bio-active compounds capable of binding to said query protein from said database as ligand candidates, based on the capability of complex formation between the query protein and the bio-active compounds; and

(8) predicting that biological functions of the query protein relate to the bio-activities of said ligand candidates.

[0014] According to a preferred embodiment of this method, the above-mentioned method comprises the steps of:

(3) extracting one or more ligand binding sites for the query protein;
(4) exploring the most stable complex formed with the ligand binding sites of the query protein for each bio-active
compound included in the database; 

(5) extracting bio-active compounds which satisfy preset hit conditions, based on the stabilities and structural features of the most stable complexes;

(6) further extracting, as required, bio-active compounds from among those extracted in step (5) which satisfy hit conditions different from those in step (5); and

(9) predicting that biological functions of the query protein relate to the bio-activity of the ligand, treating the bio-active compounds capable of complex formation extracted in step (5) or (6) as ligand candidates.

[0015] As a more preferred embodiment, there is provided the above-mentioned method in which steps (4) through (6) are performed automatically using the program ADAM&EVE.

[0016] Furthermore, the present invention provides a method of predicting biological functions of query proteins whose steric structures are known or predictable using a three-dimensional database which stores one or more intrinsic bio-active compounds with known bio-activities in organisms but whose target proteins are unknown; a method of predicting biological functions of query proteins by exploring the shapes and properties of the ligand binding site of the query protein using one or more such intrinsic bio-active compounds stored in a three-dimensional database; and a method of predicting biological functions of query proteins by extracting ligand candidates for the query protein from a three-dimensional database which stores one or more such intrinsic bio-active compounds.

[0017] In addition, the present invention provides a three-dimensional database which stores one or more intrinsic bio-active compounds with known bio-activities in organisms but whose target proteins are unknown, and which is used for the above-mentioned methods. According to a preferred embodiment of the present invention, there is provided a database including information about the bio-activity of each bio-active compound, and according to a more preferred embodiment, the above-mentioned database is prepared in a form which enables the above-mentioned steps (4) through (6) to be performed automatically using the program ADAM&EVE.

Brief Explanation of Drawings

[0018]

Figure 1 shows the three-dimensional structure and ligand binding sites of bovine trypsin. 

Figure 2 shows the binding mode of nafamostat extracted from the database as a ligand candidate. 

Most Preferred Embodiment for Carrying Out the Invention

[Preparation of Database] 

[0019] To carry out the methods of the present invention, it is preferable to prepare in advance a three-dimensional database which stores bio-active compounds which bind as ligands to target proteins with known biological functions. The kinds of bio-active compounds are not particularly limited. For example, various bio-active substances which exist in organisms may be used, such as transmitters (receptor substrates), enzyme substrates and enzyme products, vitamins, hormones, autacoids, coenzymes, amino acids, bio-active peptides, nucleotides, monosaccharides in the glycolytic pathway, or organic acids, as well as substances which do not originally exist in organisms, such as medicinal molecules, enzyme inhibitors, or toxins. Moreover, not only substances with low molecular weight but also compounds with high molecular weight such as proteins, nucleic acids like RNA or DNA, or polysaccharides may be used.
[0020] To increase the accuracy of prediction of the methods of the present invention, it is desirable to store many bio-active compounds in the above-mentioned database so that diverse biological functions are covered. Although it is sufficient to store at least one typical bio-active compound for each bio-activity in the database, it is also desirable to store as many bio-active compounds as possible with various molecular skeletons for each bio-activity. Furthermore, one may prepare two or more kinds of appropriate databases and select a desirable one for use in the methods of the present invention.

[0021] For each bio-active compound stored in the database, it is desirable to store additional information such as information about the structure of the compound, information about its binding to target proteins, information about the biological functions of the target proteins, and information about the bio-activity of the compound when the target protein is unknown. Examples of such information include one or more kinds of information selected from the following group: name of the compound; number of constituting atoms and molecular weight; element name of each atom; two-dimensional coordinates and three-dimensional coordinates; atom types for force-field calculation; atomic charge; bonding relation; modeling method; conformation; role; bio-activity; name of target protein; subunit or domain; function classification; biological species; tissue; binding constant or sub-type specificity; and steric structure information.
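One way to hold the items listed in [0021] is a typed record per compound. The field names, types and example values in the comments are assumptions for illustration; the specification does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Atom:
    element: str                              # element name, e.g. "C"
    xyz: tuple[float, float, float]           # 3D coordinates in angstroms
    atom_type: int                            # force-field atom type number
    charge: float                             # atomic charge
    xy: Optional[tuple[float, float]] = None  # optional 2D drawing coordinates


@dataclass
class CompoundRecord:
    name: str
    molecular_weight: float
    atoms: list[Atom]
    bonds: list[tuple[int, int, int]]         # bonding relation: (atom i, atom j, bond order)
    modeling_method: str                      # e.g. "crystal structure", "3D conversion"
    conformation: str                         # e.g. "local minimum", "active"
    role: str                                 # e.g. "enzyme inhibitor", "agonist"
    bio_activity: str
    target_protein: Optional[str] = None      # None when the target is unknown
    subunit_or_domain: Optional[str] = None
    function_class: Optional[str] = None      # e.g. "enzyme", "nuclear receptor"
    species: Optional[str] = None
    tissue: Optional[str] = None
    binding_constant: Optional[float] = None
    pdb_code: Optional[str] = None            # target structure in the PDB, if any
    extra: dict[str, str] = field(default_factory=dict)  # further items as required

    @property
    def n_atoms(self) -> int:
        return len(self.atoms)
```

Leaving the target-related fields optional lets the same record type serve both database variants: compounds with known target proteins and intrinsic compounds whose targets are unknown.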
[0022] In the database of the present invention, it is preferable that all of these items are included as information. However, information about bio-active substances is not limited to the above-mentioned items, and one or more of the items may be substituted by other information. Furthermore, other information may be added to the above-mentioned information as required. This information need not always be stored in a single database; it is acceptable as long as the relationships are retained, for example, by including tag information which points to records or data in each database. Each item of information is explained more specifically below. However, it should be understood that these explanations are examples and that persons skilled in the art can select the items appropriately.
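Splitting the information across databases while retaining the relationships through tags, as allowed in [0022], might look like this. The two dictionaries, the tag format and the `target_tag` key are hypothetical; methotrexate is used only because it is a well-known dihydrofolate reductase inhibitor.

```python
from typing import Optional

# Two separate stores; compound records point to target records by tag.
COMPOUNDS = {
    "CMP-0001": {"name": "methotrexate", "target_tag": "TGT-0001"},
    "CMP-0002": {"name": "compound with unknown target"},
}
TARGETS = {
    "TGT-0001": {"name": "dihydrofolate reductase", "function_class": "enzyme"},
}


def target_of(compound_id: str, compounds: dict, targets: dict) -> Optional[dict]:
    """Follow the tag stored with a compound record into the target database."""
    tag = compounds[compound_id].get("target_tag")
    return None if tag is None else targets.get(tag)
```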

[0023] As the "compound name", any name such as a common name, trade name, development code, or IUPAC name may be used as long as it identifies the bio-active compound. "Number of constituting atoms" is the number of atoms of each constituent element in the bio-active compound, which may be expressed, for example, as C24H20O2. "Three-dimensional coordinates" are preferably expressed on orthogonal axes (x, y, z) in angstrom units. "Atom types for force-field calculation" are symbols or numbers that further classify elements based on orbital hybridization and the like, and are used for calculating the force-field energy. "Atomic charge" is the formal charge assigned to each atom to calculate the electrostatic interaction energy in the force-field energy, and "bonding relation" is information showing which atom forms a covalent bond with which other atom in the molecule, and what the order of each bond is.
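For instance, a composition string such as C24H20O2 can be expanded into per-element counts and a molecular weight. This sketch handles only simple formulas without parentheses, hydrates or charges, and the small table of standard atomic weights covers only a few common elements; both limitations are assumptions made to keep the example short.

```python
import re

# Standard atomic weights (rounded) for a few common elements.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "P": 30.974, "S": 32.06}


def element_counts(formula: str) -> dict[str, int]:
    """Count atoms per element in a simple formula such as 'C24H20O2'."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum of atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in element_counts(formula).items())
```

With these weights, `molecular_weight("C24H20O2")` evaluates to about 340.42.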

[0024] "Modeling method" is information indicating the origin of the three-dimensional structure of the bio-active compound, such as whether the three-dimensional structure is derived from a single-crystal structure or is predicted from the two-dimensional structure using a program for three-dimensional conversion. "Conformation" indicates information such as whether the conformation of the three-dimensional structure is one of the local-minimum structures obtained by the three-dimensional conversion, a single-crystal structure, or a structure determined by NMR, and whether or not the conformation is the active conformation. "Role" is information showing which role the bio-active compound plays for the target protein, such as enzyme substrate, enzyme reaction product, enzyme inhibitor, coenzyme, effector, intrinsic ligand, agonist, antagonist, receptor substrate and the like. "Bio-activity" is information about the change caused in an organism upon administration of the bio-active compound.

[0025] As the "target protein name", it is preferable to adopt names which generally indicate the function of the protein, for example, dihydrofolate reductase, retinoid receptor and others. "Subunit or domain" is information indicating the subunit or domain to which the bio-active compound binds when the target protein consists of multiple subunits or domains. "Function classification" means a broad classification of the function of the target protein in the organism, exemplified by classes such as enzyme, transmembrane receptor, nuclear receptor, cytokine, and transporter protein.

[0026] Information on "biological species" includes information about the biological species from which the target protein is derived. It is usually specified taxonomically by species, genus, family, class and the like. More practically, one may use a classification such as 1 for all biological species, 2 for higher animals, 3 for lower animals, 4 for prokaryotes, and 5 for plants. "Tissue" may include at least information about the tissues where the target protein mainly exists and functions, exemplified by tissue names such as blood, liver and others in the case of humans. "Binding constant and sub-type specificity" may include information such as the binding constant, IC50, and sub-type specificity of binding. Information about "steric structure" includes whether or not the three-dimensional structure of the target protein is known; when the structure is known, it is desirable to include whether the analytical method was crystallographic analysis or NMR analysis and, if the structure is available from the Protein Data Bank, its code number there.
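The practical species codes 1 to 5 from [0026] map naturally onto an integer enumeration. The matching rule in `relevant_to` (a record coded "all biological species" applies to every query, otherwise the codes must agree) is an assumption added for illustration, not a rule stated in the specification.

```python
from enum import IntEnum


class SpeciesClass(IntEnum):
    ALL = 1             # all biological species
    HIGHER_ANIMALS = 2
    LOWER_ANIMALS = 3
    PROKARYOTES = 4
    PLANTS = 5


def relevant_to(record_class: SpeciesClass, query_class: SpeciesClass) -> bool:
    """Hypothetical filter: ALL matches any query; otherwise codes must be equal."""
    return record_class == SpeciesClass.ALL or record_class == query_class
```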


[Extraction of Ligand Binding Site on Query Protein] 

[0027] The query proteins whose biological functions are to be predicted are not limited in kind or size as long as their steric structures are known or predictable. For example, proteins consisting of multiple subunits or conjugated proteins such as glycoproteins may be used. If three-dimensional structure analysis of the query protein has been performed by a crystallographic or NMR method, the steric structure data can be used directly. Alternatively, the steric structure may be predicted by a homology modeling method or the like, using the steric structures of homologous proteins as templates.

[0028] To judge the capability of complex formation between the query protein and the bio-active compounds in the database, one or more sites in the query protein molecule are extracted as candidate ligand binding sites. Generally, this step may be performed interactively by rotating the query protein molecule on a computer graphics display and visually judging the sizes and depths of pocket- or cavity-like sites on the molecular surface that have shapes and properties characteristic of ligand binding sites. Alternatively, these sites can also be explored automatically. If multiple candidate sites are found on the molecular surface of the query protein, the following exploration steps may be performed treating each as a ligand binding site.
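The automatic exploration mentioned in [0028] is not specified further. As one possible approach, swapped in here as an assumption, a simple grid-based "buriedness" scan marks empty grid points that are enclosed by protein atoms in most directions. Atoms are plain (x, y, z) tuples, and the contact distance, ray length, grid spacing and burial threshold are illustrative values only.

```python
import itertools
import math


def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


# 6 axis directions plus 8 cube diagonals.
DIRECTIONS = [_unit(d) for d in itertools.product((-1, 0, 1), repeat=3)
              if d.count(0) in (0, 2)]


def buriedness(point, atoms, contact=1.6, max_dist=8.0, step=0.5):
    """Number of directions in which a ray from `point` meets a protein atom."""
    hits = 0
    for d in DIRECTIONS:
        t = step
        while t <= max_dist:
            probe = tuple(p + t * c for p, c in zip(point, d))
            if any(math.dist(probe, a) < contact for a in atoms):
                hits += 1
                break
            t += step
    return hits


def pocket_points(atoms, spacing=2.0, clash=2.0, min_buried=12):
    """Empty grid points in the bounding box buried in >= min_buried directions."""
    lo = [min(a[i] for a in atoms) for i in range(3)]
    hi = [max(a[i] for a in atoms) for i in range(3)]
    axes = [[lo[i] + k * spacing for k in range(int((hi[i] - lo[i]) / spacing) + 1)]
            for i in range(3)]
    found = []
    for point in itertools.product(*axes):
        if any(math.dist(point, a) < clash for a in atoms):
            continue  # grid point lies inside the protein
        if buriedness(point, atoms) >= min_buried:
            found.append(point)
    return found
```

Clusters of returned grid points would then be candidate ligand binding sites; a production tool would also account for atomic radii and merge neighbouring points into pockets.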

[Exploration of Stable Complex Formed between Bio-active Compound and Query Protein]


[0029] The capability of complex formation is judged between the one or more ligand binding sites found in the query protein and each bio-active compound stored in the database. It may be judged, for example, from the stability (such as a low energy value) and structural features of the complex after forming one or more complexes by binding one of the bio-active compounds stored in the database to the ligand binding site of the query protein. To explore efficiently the multiple stable complexes formed between the bio-active compounds and the ligand binding site of the query protein, a simulation called a docking study may be used.
[0030] This method generally includes a process of displaying the ligand binding sites of a protein of known structure on a computer graphics display and a process of exploring locations for stable binding by rotating and translating the molecule to be bound; these processes are usually conducted interactively. For molecules with flexible conformations, that is, molecules with rotatable bonds, it is preferable to include a process of exploring stable locations while varying the conformation. After obtaining several locations which may lead to stable binding, the most stable structure of the complex can be predicted by performing energy calculation and optimization as required.
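The rotate-and-translate search of [0030] can be reduced to a toy exhaustive search. This sketch treats both molecules as rigid point sets, scores a pose with a 12-6 Lennard-Jones term only (sigma = 3.0 angstroms and epsilon = 0.2 are arbitrary illustrative values), and tries rotations about the z axis combined with a supplied list of translations. It is not GREEN or ADAM, which use full force fields, hydrogen-bonding terms and conformational search.

```python
import math
from typing import Optional, Sequence

Vec = tuple[float, float, float]


def lj_energy(ligand: Sequence[Vec], protein: Sequence[Vec],
              sigma: float = 3.0, epsilon: float = 0.2) -> float:
    """Sum of 12-6 Lennard-Jones terms over all ligand-protein atom pairs."""
    total = 0.0
    for a in ligand:
        for b in protein:
            r = max(math.dist(a, b), 1e-6)  # guard against exact overlap
            sr6 = (sigma / r) ** 6
            total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


def rotate_z(points: Sequence[Vec], angle_deg: float) -> list[Vec]:
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def rigid_dock(ligand, protein, angles_deg, translations) -> Optional[tuple]:
    """Return (energy, angle, translation) of the lowest-energy pose tried."""
    best = None
    for angle in angles_deg:
        rotated = rotate_z(ligand, angle)
        for tx, ty, tz in translations:
            pose = [(x + tx, y + ty, z + tz) for x, y, z in rotated]
            e = lj_energy(pose, protein)
            if best is None or e < best[0]:
                best = (e, angle, (tx, ty, tz))
    return best
```

The nested loops make the coupling problem noted in [0031] concrete: every added rotational, translational or torsional degree of freedom multiplies the number of poses to score.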

[0031] As a program for the docking study, for example, the program GREEN developed by Tomioka et al. may suitably be employed (Tomioka, N. and Itai, A., J. Comput.-Aided Mol. Design, 8, pp. 347-366, 1994). However, since the degrees of freedom of molecular rotation and translation are coupled with the conformational degrees of freedom, the above-mentioned interactive method is in some cases insufficient for predicting the most stable structure while comprehensively covering all possible combinations. As a method of exploring the most stable structure of complexes that solves this problem, the program ADAM by Mizutani et al., which performs docking automatically, can suitably be employed (Mizutani, M. Y. et al., J. Mol. Biol., 243, pp. 310-326, 1994; US Patent No. 5,642,292; PCT/JP93/0365).

[0032] When the program ADAM is employed, several to several dozen stable complex structures, including the most stable one, can be explored efficiently out of the tremendous number of complex structures arising from the freedoms of binding mode and conformation, and the complex structures obtained can be output automatically, sorted in order of their stabilities or other indices. The program ADAM, characterized by high reliability and accuracy, first comprehensively estimates the possible binding modes and ligand conformations approximately, based on the geometric conditions of hydrogen bond formation, and then optimizes the structures by repeated energy minimization in which position and torsion angles are varied continuously.

[0033] To predict complex structures using the program ADAM, it is generally necessary to specify, in addition to the three-dimensional coordinates of the query protein and the bio-active compound, the atom-type number and atomic charge of each atom of the bio-active compound, which are used for the force-field energy calculation; the classification number of the hydrogen-bonding functional group for each heteroatom; and the initial value, final value, and increment value of the torsion angle for each rotatable bond, which are used to generate conformations. These parameters can be input interactively on a computer graphics display when the bio-active compounds included in the database are processed one by one using the program ADAM.
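The initial, final and increment values given for each rotatable bond define a grid of torsion angles, and the trial conformations are the Cartesian product of these grids. A minimal sketch with hypothetical function names; ADAM's own conformational sampling, which is coupled with energy minimization, is not reproduced here.

```python
import itertools
import math


def torsion_values(initial: float, final: float, increment: float) -> list[float]:
    """Angles from `initial` up to `final` (inclusive when reached) in `increment` steps."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    if final < initial:
        return []
    n = math.floor((final - initial) / increment + 1e-9) + 1
    return [initial + k * increment for k in range(n)]


def torsion_combinations(bonds):
    """bonds: list of (initial, final, increment); yields one angle tuple per conformation."""
    return itertools.product(*(torsion_values(*b) for b in bonds))
```

With 12 values per bond, n rotatable bonds already give 12**n combinations, which is why the exhaustive enumeration is usually combined with pruning or minimization.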

[Extraction of Bio-active Compounds to be Ligand Candidates] 

[0034] By evaluating the capability of complex formation between each bio-active compound and the query protein, compounds that can bind stably to the query protein can be extracted as ligand candidates from the bio-active compounds stored in the database. In the most preferred embodiment, the above-mentioned exploration of complex structures and extraction of ligand candidates may be conducted automatically as one consecutive process using the program ADAM&EVE (PCT/JP95/02219; WO96/13785).

[0035] When the program ADAM&EVE is employed, only the most stable complex formed with the query protein is explored automatically for each of the many diverse bio-active compounds stored in the database. Each most stable complex is then judged as to whether it satisfies preset selection criteria (hit conditions), and one or more bio-active compounds satisfying the criteria are extracted as ligand candidates. As hit conditions, parameters regarding the stabilities (energy values) and the structural features of the complexes may generally be adopted. For example, the value of the intermolecular interaction energy, the number of hydrogen bonds, the molecular weight, the number of atoms, the number of rings, and ionic or hydrogen bonding with specific functional groups in the protein may be specified arbitrarily.
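Hit conditions of the kind listed in [0035] can be written as a small settings object evaluated against each most stable complex. The field names and default thresholds below are placeholders chosen for illustration, not values taken from ADAM&EVE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MostStableComplex:
    compound: str
    interaction_energy: float       # more negative = more stable
    n_hbonds: int
    molecular_weight: float
    n_rings: int
    contacted_groups: frozenset[str] = frozenset()  # protein groups in H-bond/ionic contact


@dataclass(frozen=True)
class HitConditions:
    max_energy: float = -20.0
    min_hbonds: int = 1
    max_molecular_weight: Optional[float] = None
    required_groups: frozenset[str] = frozenset()  # groups that must be contacted

    def accepts(self, c: MostStableComplex) -> bool:
        return (c.interaction_energy <= self.max_energy
                and c.n_hbonds >= self.min_hbonds
                and (self.max_molecular_weight is None
                     or c.molecular_weight <= self.max_molecular_weight)
                and self.required_groups <= c.contacted_groups)
```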
[0036] When the exploration of complex structures and extraction of ligand candidates are conducted by the program ADAM&EVE, it is desirable to include in the database, in addition to the three-dimensional coordinates of the query protein and the bio-active compound, the coordinates of hydrogen atoms; the atom-type number and atomic charge of each atom of the bio-active compound, which are used for the force-field energy calculation; the classification number of the hydrogen-bonding functional group for each heteroatom; the rotatable bonds and their rotation information (initial value, final value, and increment value of the torsion angle); and the like, so that the many diverse bio-active compounds stored in the database can be processed automatically. A database which includes this information and is suitable for the program ADAM&EVE is a particularly preferred embodiment of the present invention.
[0037] The above-mentioned preferred database suitable for the program ADAM&EVE (ADAM-style database) can be prepared from ordinary three-dimensional databases which include information about the element name, three-dimensional coordinates and bonding relation of each constituent atom of the bio-active compounds. Since the preparation of an ADAM-style database is described in detail, for example, in PCT International Publication WO96/13785, those skilled in the art can easily prepare the database by following that procedure or with appropriate modifications and alterations as required. For example, the above-mentioned information can be assigned automatically after reading the ordinary three-dimensional structure database. If that database does not include the three-dimensional coordinates of hydrogen atoms, hydrogen atoms need to be added automatically by calculating their expected positions so that the most stable structure can be predicted correctly. If the position of a hydrogen atom cannot be predicted because of a bond rotation, it is desirable to place the hydrogen atom at the extended (trans) position.
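Placing a hydrogen atom whose position depends on a rotatable bond at the extended (trans) position amounts to building it from three reference atoms with a dihedral angle of 180 degrees. The sketch uses the standard natural-extension reference frame (NeRF) construction; the bond length and bond angle supplied by the caller (for example about 1.0 angstrom and 109.5 degrees) are illustrative assumptions, not parameters given in the specification.

```python
import math

Vec = tuple[float, float, float]


def _sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _unit(u: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in u))
    return (u[0] / n, u[1] / n, u[2] / n)


def place_atom(a: Vec, b: Vec, c: Vec, bond: float, angle_deg: float,
               torsion_deg: float = 180.0) -> Vec:
    """Atom D bonded to C: |CD| = bond, angle B-C-D = angle_deg, dihedral A-B-C-D = torsion_deg."""
    theta, phi = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))   # normal of the A-B-C plane
    m = _cross(n, bc)                   # completes the local frame
    d2 = (-bond * math.cos(theta),
          bond * math.sin(theta) * math.cos(phi),
          bond * math.sin(theta) * math.sin(phi))
    return tuple(c[i] + d2[0] * bc[i] + d2[1] * m[i] + d2[2] * n[i] for i in range(3))
```

The default `torsion_deg=180.0` encodes the extended (trans) placement; passing 0.0 would give the cis position instead.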

[0038] A preferred example of a method of preparing the database and adding the above-mentioned information is as follows: chemical structures are input using the ISIS program of MDL, which is used as a standard for managing commercially available and in-house compounds; a database is prepared in the form of two-dimensional MDL Molfiles; the structures are converted automatically to three dimensions by a three-dimensional conversion program; and the above-mentioned information is then assigned automatically. However, the database of the present invention is not limited to one prepared by this method.

[0039] By appropriately selecting the hit conditions used in the extraction of ligand candidates, it is possible to control the number of ligand candidates extracted. To perform the extraction rapidly and accurately, it is preferable to conduct it in two or more steps. For example, in the first extraction step, all bio-active compounds that could possibly be ligand candidates are extracted by applying relatively moderate hit conditions; in the next step, one or more of the most probable stable complexes are selected by setting stricter hit conditions based on the energy of the complex, the number of hydrogen bonds and other information.
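A minimal sketch of the two-step extraction in [0039], assuming each docked complex is summarized by an intermolecular interaction energy (kcal/mol, lower meaning more stable) and a count of intermolecular hydrogen bonds. The class names, the `HitCondition` fields and the threshold values in the test are illustrative assumptions, not conditions given in this document.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class ComplexResult:
    compound: str
    energy: float   # intermolecular interaction energy, kcal/mol (lower = more stable)
    h_bonds: int    # number of intermolecular hydrogen bonds

@dataclass(frozen=True)
class HitCondition:
    max_energy: float
    min_h_bonds: int

    def accepts(self, r: ComplexResult) -> bool:
        return r.energy <= self.max_energy and r.h_bonds >= self.min_h_bonds

def two_step_extraction(results: List[ComplexResult], moderate: HitCondition,
                        strict: HitCondition) -> Tuple[List[ComplexResult], List[ComplexResult]]:
    """First pass keeps every plausible ligand; second pass keeps only the most stable complexes."""
    first = [r for r in results if moderate.accepts(r)]
    second = [r for r in first if strict.accepts(r)]
    return first, second
```

Applying the strict condition only to the survivors of the moderate one keeps the second pass cheap and lets the two thresholds be tuned independently.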



[Prediction of Function of Query Protein] 

[0040] The bio-active compounds forming most stable complexes that satisfy the hit conditions (ligand candidates) are capable of binding stably to the query protein as ligands. That is, the query protein possesses a ligand binding site identical or analogous to that of the target protein to which the ligand candidates bind, and accordingly it is highly probable that the query protein and that target protein have identical or analogous biological functions. It can also be predicted that the role of the bio-active compounds toward the query protein is identical or analogous to their role toward the target protein (for example, the role of an enzyme substrate, a receptor substrate and the like). If several extracted ligand candidates with different chemical structures share the same target protein, the above-mentioned prediction is highly reliable.
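The confidence rule at the end of [0040] can be sketched as a tally: group the ligand candidates by the target protein recorded for them in the database and rank target proteins by how many candidates support them. This is a hypothetical illustration; the function name is invented, and it treats each distinct database entry as a distinct chemical structure without measuring structural dissimilarity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def rank_target_hypotheses(candidates: List[str],
                           target_of: Dict[str, str]) -> List[Tuple[str, List[str]]]:
    """Return (target protein, supporting candidates), most-supported target first."""
    support: Dict[str, List[str]] = defaultdict(list)
    for compound in candidates:
        target = target_of.get(compound)
        if target is not None:
            support[target].append(compound)
    # Ties are broken alphabetically so the output order is stable.
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))
```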



[0041] For example, if retinoic acid is extracted as a ligand candidate from a database containing various bio-active compounds, it can be predicted that the query protein functions as a retinoid receptor and that retinoic acid acts as an agonist or antagonist of the query protein. Even if identity or analogy to a specific target protein cannot be predicted, it may still be possible to predict the functions of the query protein. For example, if a bio-active compound that can bind to various biopolymers, such as the co-enzyme NADPH, is extracted as a ligand candidate, it can be predicted that the query protein is either an oxidation-reduction enzyme utilizing NADPH as a co-enzyme or an enzyme or receptor regulated by NADPH. In another case, if an intrinsic bio-active compound with known bio-activities in organisms but without a known target protein is extracted as a ligand candidate, the query protein is probably a novel receptor or enzyme for which that bio-active compound acts as an intrinsic ligand.
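The three situations in [0041] amount to a simple decision rule applied to each ligand candidate. The sketch below encodes them; the field names (`target`, `binds_many_biopolymers`) and category labels are hypothetical, and whether a compound such as NADPH "binds various biopolymers" is assumed to be a curated flag stored in the database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Candidate:
    name: str
    target: Optional[str]                 # known target protein, or None if unknown
    binds_many_biopolymers: bool = False  # e.g. co-enzymes such as NADPH

def interpret(c: Candidate) -> Tuple[str, str]:
    """Return (category, predicted function) for one ligand candidate."""
    if c.binds_many_biopolymers:
        return ("cofactor-related",
                f"enzyme using {c.name} as co-enzyme, or enzyme/receptor regulated by {c.name}")
    if c.target is not None:
        return ("target-analog",
                f"function identical or analogous to {c.target}; "
                f"{c.name} may act as agonist, antagonist or substrate")
    return ("novel-receptor-or-enzyme",
            f"possibly a novel receptor or enzyme with {c.name} as intrinsic ligand")
```

The co-enzyme case is checked first because such compounds usually also have recorded targets, and a single target would otherwise be over-interpreted.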

[0042] As an example of a preferred embodiment of the prediction method of the present invention, practical operating procedures using the program ADAM&EVE are shown in the following scheme. However, the methods of the present invention are not limited to the following methods.
Scheme

Input: structure of a protein without known function whose structure is known or predicted (query protein)
Input: database of intrinsic ligands and biologically active compounds
  ↓
Extract the ligand binding site (on computer graphics display)
ADAM-style database of bio-active compounds
  ↓
Each compound → ADAM&EVE → structure of the most stable complex
  ↓
Are the hit conditions satisfied? (No: next compound)
  ↓ Yes
Register as a ligand candidate
  ↓
All compounds processed? (No: next compound)
  ↓ Yes
One or more ligand candidates
  ↓
List of scores (for each item of the hit conditions) for each ligand candidate
  ↓
(Final selection)
  ↓
Identify the target proteins corresponding to the above ligands
  ↓
Functions of the query protein

[0043] Each step of the above scheme is explained below.

1. Select intrinsic ligand compounds in organisms, such as enzyme substrates, enzymatic products, co-enzymes, signal-transducing substances and hormones, as well as bio-active compounds whose target proteins are known. Build a database of the present invention by inputting compound names, two-dimensional structures and other information. In addition, input information about the target protein of each bio-active compound.

2. Convert the two-dimensional structures in the above-mentioned database to three-dimensional structures and create an ADAM-style database by adding the necessary data automatically.

3. Input the three-dimensional structure of a query protein.

4. Specify one or more ligand binding sites (candidate sites) interactively on a computer graphics display, and calculate the information about three-dimensional grid points, hydrogen bonding and dummy atoms that the program ADAM&EVE requires.

5. Set hit conditions. 

6. Select one bio-active compound from the database.

7. Predict the structure of the most stable complex between that bio-active compound and the query protein.

8. Judge whether the structure of the most stable complex satisfies the hit conditions.

9. If the hit conditions are satisfied, add the bio-active compound to the ligand candidate group (first extraction group) as a hit and keep its coordinate data and other information.

10. Go back to step 6, predict the structures of the most stable complexes for the other bio-active compounds, and repeat steps 6 through 9 until no bio-active compound remains to be processed.

11. For the bio-active compounds included in the ligand candidate group (first extraction group), output a list containing the number of compounds, the energy value of each complex structure, the number of hydrogen bonds and other information.

12. Reduce the number of bio-active compounds included in the ligand candidate group to a moderate number. For this selection, use one of the following methods or a combination of two or more of them: selecting a specified number of compounds based on ranking; selecting with stricter hit conditions; selecting interactively on a computer graphics display; selecting with hit conditions based on different physical or chemical properties or different computational procedures; and others.

13. Finally, select a small number of ligand candidates. It is desirable to inspect the structure of the complex for each ligand candidate on a computer graphics display.

14. Output from the database the classification and biological function of the target protein for each ligand candidate.

15. Predict one or more biological functions of the query protein.
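Steps 5 through 15 form a screening loop. The sketch below shows only its control flow: the docking calculation of step 7 is replaced by a caller-supplied function `dock` (hypothetical) that returns the interaction energy and hydrogen-bond count of the most stable complex, or `None` when no complex is found. It does not call ADAM&EVE, whose programming interface is not described here, and it uses ranking, one of the options listed in step 12, as the reduction method. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class Entry:
    name: str
    target: str        # target protein with a known function (step 1)

@dataclass(frozen=True)
class Hit:
    name: str
    target: str
    energy: float      # kcal/mol, lower means more stable
    h_bonds: int

DockFn = Callable[[str], Optional[Tuple[float, int]]]

def screen(database: Iterable[Entry], dock: DockFn,
           max_energy: float, min_h_bonds: int) -> List[Hit]:
    """Steps 6-11: dock every compound, keep those meeting the hit conditions,
    and return them ranked from most to least stable."""
    hits: List[Hit] = []
    for entry in database:                          # steps 6 and 10
        result = dock(entry.name)                   # step 7 (stubbed)
        if result is None:
            continue
        energy, h_bonds = result
        if energy <= max_energy and h_bonds >= min_h_bonds:        # step 8
            hits.append(Hit(entry.name, entry.target, energy, h_bonds))  # step 9
    return sorted(hits, key=lambda h: h.energy)     # step 11

def predict_functions(hits: List[Hit], top_n: int) -> List[str]:
    """Steps 12-15: keep the top_n ranked candidates and report their target proteins."""
    final = hits[:top_n]
    return sorted({h.target for h in final})
```

Passing `dock` in as a function keeps the loop independent of any particular docking program; in practice that function would wrap whatever program computes the most stable complex.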
Example 

[0044] An example is provided below to describe the present invention more specifically. However, the scope of the present invention is not limited to this example.

Example 1 

[0045] We constructed a small database including the bio-active compounds shown in Table 1 and examined, for each bio-active compound in the database, its capability of binding to a protein with a known three-dimensional structure. Although the methods of the present invention can in principle be applied to query proteins without known functions, we used bovine trypsin as the query protein, assuming that its function was unknown, and investigated whether nafamostat, a trypsin inhibitor, would be selected as a ligand candidate. The three-dimensional structure of bovine trypsin and its ligand binding site are shown in Figure 1.



Table 1

Bio-active compound    Target biopolymers
Methotrexate           Dihydrofolate reductase
Retinoic acid          Retinoid receptor
Nafamostat             Trypsin
Indomethacin           Cyclooxygenase
Donepezil (E2020)      Acetylcholinesterase
Phorbol ester          Protein kinase C
Morphine               Opioid receptor
Estradiol              Estrogen receptor


[0046] As a result of searching the database, nafamostat was selected as a ligand candidate and was shown to bind stably to the query protein. The binding mode of nafamostat at the ligand binding site is shown in Figure 2. Indomethacin was predicted to be able to form a complex, albeit a rather unstable one, and all the other compounds were judged incapable of complex formation (Table 2). From the result of this search, the function of the query protein was predicted to be identical or similar to that of trypsin, the target protein of nafamostat.



Table 2

Bio-active compound    Intermolecular interaction (kcal/mol)    Number of intermolecular hydrogen bonds
Methotrexate           NA                                       NA
Retinoic acid          NA                                       NA
Nafamostat             -39.6                                    5
Indomethacin           -29.9                                    2
Donepezil (E2020)      NA                                       NA
Phorbol ester          NA                                       NA
Morphine               NA                                       NA
Estradiol              NA                                       NA


Industrial applicability 

[0047] According to the methods of the present invention, the functions of proteins without known functions can be predicted rapidly and accurately. The database of the present invention is useful for carrying out these methods efficiently.

Claims 

1. A method of predicting biological functions of query proteins whose steric structures are known or predictable, 
using a three-dimensional structure database storing one or more bio-active compounds which bind to target pro- 
teins with known biological functions, which comprises the steps of: 

(1) extracting bio-active compounds capable of binding to said query protein as ligand candidates from said database based on the capability of complex formation between the query protein and the bio-active compounds; and

(2) predicting that biological functions of the query protein are identical or similar to the biological functions of 
the target protein to which said ligand candidates bind. 

2. The method according to claim 1, which further comprises the steps of:

(3) extracting one or more ligand binding sites for the query protein; 

(4) exploring the most stable complex formed with the ligand binding site of the query protein for each bio-active compound included in the database;

(5) extracting bio-active compounds which satisfy preset hit conditions based on the stabilities and structural features of the most stable complex;

(6) extracting further, as required, bio-active compounds from the bio-active compounds extracted in step (5) 
which satisfy hit conditions different from those in the above-mentioned step (5); and 

(7) predicting that biological functions of the query protein are identical or similar to the biological functions of the target protein to which said ligand candidates bind, while treating the bio-active compounds capable of complex formation extracted in step (5) or (6) as ligand candidates.

3. The method according to claim 2, wherein the above-mentioned steps (4) through (6) are performed automatically using the program ADAM&EVE.

4. A method of predicting biological functions of proteins using a three-dimensional structure database storing one or more bio-active compounds which bind to target proteins with known biological functions.

5. A method of predicting biological functions of proteins by exploring shapes and properties of the ligand binding site of the query protein using one or more bio-active compounds which bind to target proteins with known biological functions and which are stored in a three-dimensional structure database.

6. A method of predicting functions of query proteins by extracting ligand candidates of the query protein from a three- 
dimensional structure database storing one or more bio-active compounds which bind to target proteins with known 
biological functions. 
7. A three-dimensional structure database storing one or more bio-active compounds which bind to a target protein with known biological functions, which is used in any one of the methods according to claims 1 through 6.

8. The database according to claim 7, including information about the target protein for each bio-active compound.

9. A method of predicting biological functions of query proteins whose steric structures are known or predictable, using a three-dimensional structure database storing one or more intrinsic bio-active compounds with known bio-activities in organisms but without knowledge of the target protein, which comprises the steps of:

(1) extracting bio-active compounds capable of binding to the query protein as ligand candidates from the database based on the capability of complex formation between the query protein and the bio-active compounds; and

(8) predicting that biological functions of the query protein concern the bio-activity of said ligand. 

10. The method according to claim 9, which further comprises the steps of: 

(3) extracting one or more ligand binding sites for the query protein; 

(4) exploring the most stable complex formed with the ligand binding site of the query protein for each bio-active compound included in the database;

(5) extracting bio-active compounds which satisfy preset hit conditions based on the stability and structural features of the most stable complex;

(6) extracting further, as required, bio-active compounds from the bio-active compounds extracted in step (5) which satisfy hit conditions different from those in the above-mentioned step (5); and

(9) predicting that biological functions of the query protein concern the bio-activities of the ligands, while treating the bio-active compounds capable of complex formation extracted in the above-mentioned step (5) or (6) as ligand candidates.

11. The method according to claim 10, wherein the above-mentioned steps (4) through (6) are performed automatically using the program ADAM&EVE.

12. A method of predicting biological functions of query proteins whose steric structures are known or predictable, using a three-dimensional structure database storing one or more intrinsic bio-active compounds with known bio-activities in organisms but without knowledge of the target protein.

13. A method of predicting biological functions of query proteins by exploring shapes and properties of ligand binding sites of the query protein by using one or more intrinsic bio-active compounds with known bio-activities in organisms but without knowledge of the target protein, which are stored in a three-dimensional structure database.

14. A method of predicting biological functions of query proteins by extracting ligand candidates for the query protein from a three-dimensional structure database storing one or more intrinsic bio-active compounds with known bio-activities in organisms but without knowledge of the target protein.

15. A three-dimensional structure database storing one or more intrinsic bio-active compounds with known bio-activities in organisms but without knowledge of the target protein, which is used in any one of the methods according to claims 9 through 13.







EP 1008 572 A1 



Fig. 2 







INTERNATIONAL SEARCH REPORT
International application No. PCT/JP98/02986

A. CLASSIFICATION OF SUBJECT MATTER
Int. Cl. C07B61/00, G06F15/40, G06F17/50, G06F17/30, C07K1/00
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED
Minimum documentation searched (classification system followed by classification symbols)
Int. Cl. C07B61/00, G06F15/40, G06F17/50, G06F17/30, C07K1/00
Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched
Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
BIOSIS PREVIEWS

C. DOCUMENTS CONSIDERED TO BE RELEVANT
Category* | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No.
P, A | WO, 97/24301, A1 (ITAI, Akiko), 10 July, 1997 (10. 07. 97) & AU, 9711528, B | 1-6, 9-14
A | WO, 96/13785, A1 (ITAI, Akiko), 9 May, 1996 (09. 05. 96) & EP, 790567, A1 | 1-6, 9-14
A | WO, 93/20525, A1 (ITAI, Akiko), 14 October, 1993 (14. 10. 93) & EP, 633534, A1 & US, 5642292, A | 1-6, 9-14
A | ITAI, A. et al., "Rational Automatic Search Method for Stable Docking Models of Protein and Ligand", J. Mol. Biol., (1994) 243(2) p. 310-326 | 1-6, 9-14
A | TOMIOKA, N. et al., "GREEN: A Program Package for Docking Studies in Rational Drug Design", J. Computer-Aided Mol. Design, (1994) 8(4) p. 347-366 | 1-6, 9-14

[X] Further documents are listed in the continuation of Box C.   [ ] See patent family annex.

* Special categories of cited documents:
"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier document but published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step

Date of the actual completion of the international search: 29 September, 1998 (29. 09. 98)
Date of mailing of the international search report: 13 October, 1998 (13. 10. 98)
Name and mailing address of the ISA/: Japanese Patent Office
Facsimile No.
Authorized officer
Telephone No.

Form PCT/ISA/210 (second sheet) (July 1992)






INTERNATIONAL SEARCH REPORT
International application No. PCT/JP98/02986

C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT
Category* | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No.
 | YAMADA, M. et al., "Development of an Efficient Automated Docking Method", Chem. Pharm. Bull., (1993) 41(6) p. 1200-1202 | 1-6, 9-14
 | YAMADA, M. et al., "Application and Evaluation of the Automated Docking Method", Chem. Pharm. Bull., (1993) 41(6) p. 1203-1205 | 1-6, 9-14
 | JP, 2-200641, A (Mochida Pharmaceutical Co., Ltd.), 8 August, 1990 (08. 08. 90) (Family: none) | 1-6, 9-14
 | KATCHALSKI-KATZIR, E. et al., "Molecular Surface Recognition Determination of Geometric Fit between Proteins and their Ligands by Correlation Techniques", Proc. Natl. Acad. Sci. USA, (1992) 89(6) p. 2195-2199 | 1-6, 9-14
 | SCHERZ, M.W. "Synthesis and Structure-Activity Relationships of N,N'-di-o-Tolylguanidine Analogues High-Affinity Ligands for the Haloperidol-Sensitive Sigma Receptor", J. Med. Chem., (1990) 33(9) p. 2421-2429 | 1-6, 9-14
 | NISHIBATA, Y. "Automatic Creation of Drug Candidate Structures Based on Receptor Structure. Starting Point for Artificial Lead Generation", Tetrahedron, (1991) 47(43) p. 8985-8990 | 1-6, 9-14
 | GALLOP, M.A. et al., "Applications of Combinatorial Technologies to Drug Discovery. I. Background and Peptide Combinatorial Libraries", J. Med. Chem., (1994) 37(9) p. 1233-1251 | 1-6, 9-14

Form PCT/ISA/210 (continuation of second sheet) (July 1992)







INTERNATIONAL SEARCH REPORT
International application No. PCT/JP98/02986

Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)

This international search report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [X] Claims Nos.: 7, 8, 15
because they relate to subject matter not required to be searched by this Authority, namely:
Claims 7, 8 and 15 pertain to data bases and are therefore considered mere presentations of information. Thus, they relate to a subject matter which this International Searching Authority is not required, under the provisions of Article 17(2)(a)(i) of the PCT and Rule 39.1(iv) of the Regulations under the PCT, to search.

2. [ ] Claims Nos.:
because they relate to parts of the international application that do not comply with the prescribed requirements to such an extent that no meaningful international search can be carried out, specifically:

3. [ ] Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

1. [ ] As all required additional search fees were timely paid by the applicant, this international search report covers all searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment of any additional fee.

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this international search report covers only those claims for which fees were paid, specifically claims Nos.:

4. [ ] No required additional search fees were timely paid by the applicant. Consequently, this international search report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

Remark on Protest [ ] The additional search fees were accompanied by the applicant's protest.
[ ] No protest accompanied the payment of additional search fees.

Form PCT/ISA/210 (continuation of first sheet (1)) (July 1992)



